KeyWorld: Extracting Keywords from a Document as a Small World
Abstract
The small-world topology is known to be widespread in biological, social, and man-made systems. This paper shows that the small-world structure also exists in documents such as research papers. A document is represented as a network in which the nodes represent terms and the edges represent the co-occurrence of terms. This network is shown to have the characteristics of a small world, i.e., it is highly clustered and has a short average path length. Based on this topology, we develop an indexing system called KeyWorld, which extracts important terms by measuring their contribution to the graph being a small world.
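The abstract does not define how a term's "contribution" to the small world is measured. The sketch below assumes one plausible reading and is not KeyWorld's published definition. It builds the term co-occurrence graph (assuming a one-sentence co-occurrence window), computes the clustering coefficient C and the average path length L, and scores each term by how much L grows when that term is removed. The function names, the sentence window, and the penalty charged for pairs that become unreachable are all hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Undirected term graph: terms in the same sentence are joined by an edge.

    Assumption: the co-occurrence window is one sentence (hypothetical choice)."""
    graph = {}
    for sentence in sentences:
        terms = sorted(set(sentence))
        for term in terms:
            graph.setdefault(term, set())
        for a, b in combinations(terms, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph):
    """Mean local clustering coefficient C; nodes with degree < 2 count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count the edges that actually exist among this node's neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph, removed=None):
    """Mean shortest-path length L over ordered pairs of nodes, skipping `removed`.

    Unreachable pairs are charged len(graph), a hypothetical penalty that is
    longer than any shortest path in the original graph."""
    nodes = [n for n in graph if n != removed]
    if len(nodes) < 2:
        return 0.0
    penalty = len(graph)
    total = 0
    for source in nodes:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v != removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        total += penalty * (len(nodes) - len(dist))
    return total / (len(nodes) * (len(nodes) - 1))


def small_world_contribution(graph):
    """Hypothetical score: the increase in L when a term is removed from the graph."""
    base = average_path_length(graph)
    return {term: average_path_length(graph, removed=term) - base for term in graph}


if __name__ == "__main__":
    sentences = [
        ["graph", "network", "node"],
        ["network", "edge", "node"],
        ["term", "keyword", "document"],
        ["keyword", "document", "index"],
        ["node", "keyword"],  # the only sentence linking the two topics
    ]
    g = build_cooccurrence_graph(sentences)
    print("C =", round(clustering_coefficient(g), 3))
    print("L =", round(average_path_length(g), 3))
    scores = small_world_contribution(g)
    for term in sorted(scores, key=scores.get, reverse=True)[:3]:
        print(term, round(scores[term], 3))
```

On this toy input, the two bridging terms ("node" and "keyword") receive the highest scores because removing either one splits the graph into two topics. Removing any other term barely changes L or even shortens it. A fuller small-world check would also compare C and L against a random graph of the same size and density. In the Watts and Strogatz sense, a small world has C much larger than the random baseline while L stays about the same.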
Similar resources
Improving Precision of Keywords Extracted From Persian Text Using Word2Vec Algorithm
Keywords can present the main concepts of the text without human intervention according to the model. Keywords are important vocabulary words that describe the text and play a very important role in accurate and fast understanding of the content. The purpose of extracting keywords is to identify the subject of the text and the main content of the text in the shortest time. Keyword extraction pl...
Extraction of Representative Keywords Considering Co-occurrence in Positive Documents
In linear text classification, user feedback is usually used to tune the representative keywords (RK) for a certain class. Although some algorithms (e.g., Rocchio) deal well with positive and negative user feedback when adjusting the RKs, few studies have investigated how to adjust RKs based only on a small number of positive responses, which is a common case in real-world applications (e.g. users tend...
A Model for Extracting Keywords of Document Using Term Frequency and Distribution
In information retrieval systems, it is very important that documents are indexed with appropriate terms. In this paper, we propose a simple retrieval model based on the distribution characteristics of terms in addition to term frequency in documents. We define the distribution characteristics of keywords using a statistic, the standard deviation. We can extract document keywords that term fr...
Keyword Extraction from a Single Document Using Centrality Measures
Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to extract keywords. We represent the given document as an undirected graph, whose vertices are words in the document and the edges are labeled with a dissimilarity measure between two ...
Design and Analysis of the Performance of Clustering of Conversation Documents Based on Keyword Extraction Mechanism
Nowadays, data mining has become one of the most fascinating domains in nearly every field, such as medicine, shopping, business, multinational companies, information technology, and many more. The main goal of data mining is to extract valuable information from large data sets in order to retrieve the desired result as output. In this thesis we mainly try to extract the large ...
Publication date: 2001